HDDS-13891. SCM-based health monitoring and batch processing in Recon #9258

sumitagrawl merged 45 commits into apache:master
Conversation
…to run the replication manager logic in Recon itself.
dombizita left a comment
Thank you for working on this @devmadhuu! This is quite a big PR; I went through mostly the REPLICA_MISMATCH related changes (which look good), but also tried to go over the whole change set, except the test changes.
Two overall comments: I believe you have a design document for this solution, so please share it on the Jira and in the PR description too. Also, if you used any kind of AI tool, please mention it in the description, as the ASF asks for that. Thanks!
 * @param container The container ID to record
 */
@Override
public void incrementAndSample(ContainerHealthState stat, ContainerInfo container) {
We can remove the other methods and just have two methods for the replica-mismatch state.
As discussed, since Recon's replication manager maintains its own map, the other methods in this class are needed.
ArafatKhan2198 left a comment
Some comments on the patch @devmadhuu
    Set<Long> negativeSizeRecorded,
    ProcessingStats stats) throws ContainerNotFoundException {
  switch (state) {
    case MISSING:
SCM's handler chain can emit composite health states like:
UNHEALTHY_UNDER_REPLICATED
QUASI_CLOSED_STUCK_MISSING
QUASI_CLOSED_STUCK_UNDER_REPLICATED
MISSING_UNDER_REPLICATED
etc.
But the switch statement in handleScmStateContainer() only handles 4 states: MISSING, UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED. Everything else falls into default: break; and is silently thrown away.
This means a container that is both quasi-closed-stuck AND has no replicas (QUASI_CLOSED_STUCK_MISSING) will never appear in the Recon UI or the UNHEALTHY_CONTAINERS table.
Are these composite states intentionally excluded from V2? Or should we map them to their base state (e.g., QUASI_CLOSED_STUCK_UNDER_REPLICATED → store as UNDER_REPLICATED with an appropriate reason string)?
This is a good point. We can map these composites onto storable base states in Recon without changing the DB enum/schema:
UNHEALTHY_UNDER_REPLICATED -> UNDER_REPLICATED
UNHEALTHY_OVER_REPLICATED -> OVER_REPLICATED
QUASI_CLOSED_STUCK_MISSING -> MISSING
QUASI_CLOSED_STUCK_UNDER_REPLICATED -> UNDER_REPLICATED
QUASI_CLOSED_STUCK_OVER_REPLICATED -> OVER_REPLICATED
MISSING_UNDER_REPLICATED -> both MISSING and UNDER_REPLICATED
This keeps compatibility with current UNHEALTHY_CONTAINERS allowed states and avoids silent loss.
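A minimal, self-contained sketch of the proposed mapping. The method name mapToBaseStates and the String-based states are illustrative only; the real code would switch over SCM's health-state enum rather than strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: fold SCM's composite health states into the base
// states Recon's UNHEALTHY_CONTAINERS schema already allows.
public final class CompositeStateMapper {

  static List<String> mapToBaseStates(String scmState) {
    switch (scmState) {
      case "UNHEALTHY_UNDER_REPLICATED":
      case "QUASI_CLOSED_STUCK_UNDER_REPLICATED":
        return Collections.singletonList("UNDER_REPLICATED");
      case "UNHEALTHY_OVER_REPLICATED":
      case "QUASI_CLOSED_STUCK_OVER_REPLICATED":
        return Collections.singletonList("OVER_REPLICATED");
      case "QUASI_CLOSED_STUCK_MISSING":
        return Collections.singletonList("MISSING");
      case "MISSING_UNDER_REPLICATED":
        // store under both base states so neither signal is lost
        return Arrays.asList("MISSING", "UNDER_REPLICATED");
      default:
        // already a base state (or unknown; the caller logs it)
        return Collections.singletonList(scmState);
    }
  }

  public static void main(String[] args) {
    System.out.println(mapToBaseStates("QUASI_CLOSED_STUCK_MISSING")); // [MISSING]
  }
}
```

Because unknown states fall through the default branch unchanged, the caller can still detect and log anything outside the allowed set before writing to the DB.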
      handleMissingContainer(containerId, currentTime,
          existingInStateSinceByContainerAndState, recordsToInsert, stats);
      break;
    case UNDER_REPLICATED:
Shouldn't we have a state for REPLICA_MISMATCH also?
Good catch. Yes, it will be added.
    List<UnhealthyContainerRecordV2> recordsToInsert,
    Set<Long> negativeSizeRecorded,
    ProcessingStats stats) throws ContainerNotFoundException {
  switch (state) {
Remember, in our offline discussion we checked what happens if we attempt to add a state to the Derby table that violates the allowed-state constraint. I believe this switch case will prevent a new state from being added to the database.
Yes, the switch case will prevent a new state from being added to the database. Unsupported/new SCM states will be detected and logged (not silently dropped), while DB writes remain constrained to Recon's allowed enum states only.
healthSchemaManager.batchDeleteSCMStatesForContainers(containerIdsToDelete);

LOG.info("Inserting {} unhealthy container records", recordsToInsert.size());
healthSchemaManager.insertUnhealthyContainerRecords(recordsToInsert);
}
healthSchemaManager.batchDeleteSCMStatesForContainers(containerIdsToDelete); // step 1
healthSchemaManager.insertUnhealthyContainerRecords(recordsToInsert); // step 2
These two calls are not in the same transaction. If Recon crashes between step 1 and step 2, all old health data is gone but the new data was never written. The API would return 0 unhealthy containers until the next scan runs. Also, if someone queries the API between step 1 and step 2, they get empty or partial results.
Yes, this will be handled as part of the same comment from Sumit above.
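To illustrate why the delete and the insert need to share one transaction, here is a minimal pure-Java sketch that models the table as an in-memory list and restores a snapshot on failure. All names are hypothetical; in the actual code the same guarantee would come from running both jOOQ calls inside a single transaction rather than this hand-rolled rollback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch: if the "crash" happens between delete and insert, rollback keeps
// the old rows visible instead of leaving the table empty.
public final class AtomicReplaceSketch {
  static <T> void inTransaction(List<T> table, Consumer<List<T>> work) {
    List<T> snapshot = new ArrayList<>(table); // pre-transaction state
    try {
      work.accept(table);                      // step 1 + step 2 together
    } catch (RuntimeException e) {
      table.clear();                           // "rollback": restore snapshot
      table.addAll(snapshot);
      throw e;
    }
  }

  public static void main(String[] args) {
    List<String> table = new ArrayList<>(Arrays.asList("old1", "old2"));
    try {
      inTransaction(table, t -> {
        t.clear();                             // batch delete (step 1)
        throw new RuntimeException("crash before insert (step 2)");
      });
    } catch (RuntimeException ignored) {
      // readers never observe the empty intermediate state
    }
    System.out.println(table); // [old1, old2]
  }
}
```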
for (int from = 0; from < allContainers.size(); from += PERSIST_CHUNK_SIZE) {
  int to = Math.min(from + PERSIST_CHUNK_SIZE, allContainers.size());
  List<Long> chunkContainerIds = collectContainerIds(allContainers, from, to);
This collects every single container ID in the cluster (healthy and unhealthy) and runs DELETE statements for all of them. On a cluster with 1 million containers, that means:
- Allocating a list with 1M entries
- Running 1,000 chunked DELETE statements
- Most of those containers are healthy and have no rows in the table, so the DELETEs are wasted work
Suggestion: Instead, just delete all rows by state: DELETE FROM UNHEALTHY_CONTAINERS WHERE container_state IN (...) — one statement, no chunking, much faster.
We no longer delete by passing all container IDs blindly. For each chunk, we first load the existing rows and build existingContainerIdsToDelete, so DELETE is issued only for container IDs that actually have persisted unhealthy rows in the DB. We also process in bounded chunks (PERSIST_CHUNK_SIZE), so we never hold a single list of 1M container IDs at once.
Another reason we did not switch to DELETE ... WHERE container_state IN (...) is that it would clear all unhealthy rows globally before reinsert, which would also remove the in_state_since replica history. See @dombizita's comments.
// Call inherited processContainer - this runs SCM's health check chain
// readOnly=true ensures no commands are generated
processContainer(container, nullQueue, report, true);
Minor suggestion:
processContainer(container, nullQueue, report, true); // calls getContainerReplicas() internally
Set<ContainerReplica> replicas = containerManager.getContainerReplicas(cid); // calls again
Every container's replicas are fetched twice: once inside the inherited processContainer() and once for the REPLICA_MISMATCH check. On a 1 million container cluster, that's 2M replica lookups. We can fetch the replicas once and pass them to both operations.
Good catch. This one is worth a real code change, not just a reply: I'll remove the duplicate replica fetch by adding a ReplicationManager overload that accepts pre-fetched replicas and then use it from Recon.
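A rough sketch of the single-fetch pattern. The counter stands in for the real containerManager.getContainerReplicas(cid) call, and every method and class name here is hypothetical; the point is only that one lookup feeds both the health chain and the mismatch check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: fetch replicas once, then pass the same set to the inherited
// health-check overload and to the REPLICA_MISMATCH check.
public final class SingleFetchSketch {
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  static Set<String> fetchReplicas(long containerId) {
    LOOKUPS.incrementAndGet(); // stands in for getContainerReplicas(cid)
    return new HashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
  }

  // hypothetical overload taking pre-fetched replicas
  static void processContainer(long cid, Set<String> replicas) { /* SCM chain */ }

  static boolean replicaMismatch(Set<String> replicas) {
    return false; // checksum comparison elided in this sketch
  }

  public static void main(String[] args) {
    long cid = 42L;
    Set<String> replicas = fetchReplicas(cid); // one lookup...
    processContainer(cid, replicas);           // ...reused here
    boolean mismatch = replicaMismatch(replicas); // ...and here
    System.out.println(LOOKUPS.get() + " lookup(s), mismatch=" + mismatch);
  }
}
```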
dslContext.createIndex("idx_state_container_id")
    .on(DSL.table(UNHEALTHY_CONTAINERS_TABLE_NAME),
        DSL.field(name(CONTAINER_STATE)),
        DSL.field(name(CONTAINER_ID)))
if (!TABLE_EXISTS_CHECK.test(conn, UNHEALTHY_CONTAINERS_TABLE_NAME)) {
createUnhealthyContainersTable(); // creates table + composite index
}
The composite index idx_state_container_id is created inside createUnhealthyContainersTable(), and this method is only called if the table doesn't exist.
If someone upgrades an existing Recon deployment, the UNHEALTHY_CONTAINERS table already exists (from V1), so this entire method is skipped. The new composite index is never created on existing clusters; they're stuck with the old single-column index and get none of the 43–67× performance improvement.
Handled using a new upgrade action.
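As a sketch of the idea (not Ozone's actual upgrade framework, whose classes and names differ), an upgrade action keyed by schema version runs exactly once on deployments that predate it, which is how an existing V1 table can still get the new index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical version-gated upgrade runner: actions newer than the stored
// schema version execute once on startup, e.g. creating the composite index
// on clusters whose UNHEALTHY_CONTAINERS table predates it.
public final class UpgradeActionSketch {
  interface UpgradeAction { void execute(); }

  static int runPendingActions(int currentVersion,
      SortedMap<Integer, UpgradeAction> actions) {
    int newVersion = currentVersion;
    for (Map.Entry<Integer, UpgradeAction> e
        : actions.tailMap(currentVersion + 1).entrySet()) {
      e.getValue().execute();  // e.g. CREATE INDEX idx_state_container_id ...
      newVersion = e.getKey(); // persisted as the new schema version
    }
    return newVersion;
  }

  public static void main(String[] args) {
    SortedMap<Integer, UpgradeAction> actions = new TreeMap<>();
    List<String> executed = new ArrayList<>();
    actions.put(1, () -> executed.add("create UNHEALTHY_CONTAINERS"));
    actions.put(2, () -> executed.add("create idx_state_container_id"));
    // Existing deployment already at version 1: only the index action runs.
    int v = runPendingActions(1, actions);
    System.out.println(v + " " + executed);
  }
}
```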
long now = System.currentTimeMillis();
long insertStart = System.nanoTime();

for (int startId = 1; startId <= CONTAINER_ID_RANGE; startId += CONTAINERS_PER_TX) {
Current code; check if it can be optimized or changed:
start tx
  delete 0.5 million
  loop over 1 million in batches:
    batch insert 1k
  end loop
end tx
Updated the tests. Below is the perf data:
2026-03-23 17:47:28,930 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testBatchInsertOneMillionRecords(290)) - --- Test 1: Batch INSERT 1000000 records (2000 containers/tx, 100 transactions) ---
2026-03-23 17:47:43,984 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testBatchInsertOneMillionRecords(304)) - Batch INSERT complete: 1000000 records in 15037 ms (66503 rec/sec, 100 tx)
2026-03-23 17:47:43,993 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testTotalInsertedRecordCountIsOneMillion(319)) - --- Test 2: Verify total row count = 1000000 ---
2026-03-23 17:47:44,267 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testTotalInsertedRecordCountIsOneMillion(325)) - COUNT(*) = 1000000 rows in 274 ms
2026-03-23 17:47:44,270 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(345)) - --- Test 3: COUNT(*) by state (index-covered, 200000 records each) ---
2026-03-23 17:47:44,361 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(359)) - COUNT(UNDER_REPLICATED) = 200000 rows in 91 ms
2026-03-23 17:47:44,495 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(359)) - COUNT(MISSING) = 200000 rows in 133 ms
2026-03-23 17:47:44,624 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(359)) - COUNT(OVER_REPLICATED) = 200000 rows in 128 ms
2026-03-23 17:47:44,687 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(359)) - COUNT(MIS_REPLICATED) = 200000 rows in 62 ms
2026-03-23 17:47:44,722 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStatePerformanceUsesIndex(359)) - COUNT(EMPTY_MISSING) = 200000 rows in 33 ms
2026-03-23 17:47:44,723 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testGroupBySummaryQueryPerformance(385)) - --- Test 4: GROUP BY summary over 1000000 rows ---
2026-03-23 17:47:45,271 [ForkJoinPool-1-worker-1] INFO impl.Tools (JooqLogger.java:info(338)) - Kotlin is available, but not kotlin-reflect. Add the kotlin-reflect dependency to better use Kotlin features like data classes
2026-03-23 17:47:45,272 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testGroupBySummaryQueryPerformance(392)) - GROUP BY summary: 5 state groups returned in 548 ms
2026-03-23 17:47:45,272 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:lambda$testGroupBySummaryQueryPerformance$0(395)) - state=EMPTY_MISSING count=200000
2026-03-23 17:47:45,273 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:lambda$testGroupBySummaryQueryPerformance$0(395)) - state=MISSING count=200000
2026-03-23 17:47:45,273 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:lambda$testGroupBySummaryQueryPerformance$0(395)) - state=MIS_REPLICATED count=200000
2026-03-23 17:47:45,273 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:lambda$testGroupBySummaryQueryPerformance$0(395)) - state=OVER_REPLICATED count=200000
2026-03-23 17:47:45,273 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:lambda$testGroupBySummaryQueryPerformance$0(395)) - state=UNDER_REPLICATED count=200000
2026-03-23 17:47:45,274 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testPaginatedReadByStatePerformance(431)) - --- Test 5: Paginated read of UNDER_REPLICATED (200000 records, page size 5000) ---
2026-03-23 17:47:45,779 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testPaginatedReadByStatePerformance(470)) - Paginated read: 200000 records in 40 pages, 504 ms (200000 rec/sec)
2026-03-23 17:47:45,781 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(496)) - --- Test 6: Full 1 M record read (all states, paged) ---
2026-03-23 17:47:46,155 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(524)) - State UNDER_REPLICATED: 200000 records in 373 ms
2026-03-23 17:47:46,532 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(524)) - State MISSING: 200000 records in 376 ms
2026-03-23 17:47:46,915 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(524)) - State OVER_REPLICATED: 200000 records in 382 ms
2026-03-23 17:47:47,275 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(524)) - State MIS_REPLICATED: 200000 records in 359 ms
2026-03-23 17:47:47,647 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(524)) - State EMPTY_MISSING: 200000 records in 371 ms
2026-03-23 17:47:47,648 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testFullDatasetReadThroughputAllStates(532)) - Full dataset read: 1000000 total records in 1865 ms (536193 rec/sec)
2026-03-23 17:47:47,650 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testAtomicReplaceDeleteAndInsertInSingleTransaction(562)) - --- Test 7: Atomic replace — 200000 IDs × 5 states = 1000000 rows in one tx ---
2026-03-23 17:49:06,774 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testAtomicReplaceDeleteAndInsertInSingleTransaction(575)) - Atomic replace completed in 79103 ms
2026-03-23 17:49:07,003 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testBatchDeletePerformanceOneMillionRecords(626)) - --- Test 8: Batch DELETE — 200000 IDs × 5 states = 1000000 rows (200 internal SQL statements of 1000 IDs) ---
2026-03-23 17:50:11,884 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testBatchDeletePerformanceOneMillionRecords(644)) - DELETE complete: 200000 IDs (1000000 rows) in 64881 ms via 200 SQL statements (15413 rows/sec)
2026-03-23 17:50:11,917 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testBatchDeletePerformanceOneMillionRecords(649)) - Rows remaining after delete: 0 (expected 0)
2026-03-23 17:50:11,918 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(672)) - --- Test 9: COUNT by state after full delete (expected 0 each) ---
2026-03-23 17:50:11,934 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(686)) - COUNT(UNDER_REPLICATED) = 0 rows in 15 ms
2026-03-23 17:50:11,951 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(686)) - COUNT(MISSING) = 0 rows in 16 ms
2026-03-23 17:50:11,966 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(686)) - COUNT(OVER_REPLICATED) = 0 rows in 15 ms
2026-03-23 17:50:11,968 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(686)) - COUNT(MIS_REPLICATED) = 0 rows in 1 ms
2026-03-23 17:50:11,970 [ForkJoinPool-1-worker-1] INFO persistence.TestUnhealthyContainersDerbyPerformance (TestUnhealthyContainersDerbyPerformance.java:testCountByStateAfterFullDelete(686)) - COUNT(EMPTY_MISSING) = 0 rows in 1 ms
ArafatKhan2198 left a comment
Thanks for the changes @devmadhuu
LGTM +1
What changes were proposed in this pull request?
This PR implements ContainerHealthTaskV2 by extending SCM's ReplicationManager for use in Recon. This approach evaluates container health locally using SCM's proven health check logic without requiring network communication between SCM and Recon.

Design
https://docs.google.com/document/d/1iea0eC4IpPa4Qpmc47Ae3KyneFCZ_fyyuhbZwqrR3cM/edit?pli=1&tab=t.0#heading=h.986yaoz7wnxv
Implementation Approach

Introduces ContainerHealthTaskV2, a new implementation that determines container health states by:
- extending SCM's ReplicationManager as ReconReplicationManager;
- running processAll() to evaluate all containers using SCM's proven health check logic.

Container Health States Detected
ContainerHealthTaskV2 detects 5 distinct health states:
SCM Health States (Inherited)
Recon-Specific Health State
Implementation: ReconReplicationManager first runs SCM's health checks, then additionally checks for REPLICA_MISMATCH by comparing checksums across replicas. This ensures both replication health and data integrity are monitored.
Testing
UNHEALTHY_CONTAINERS table

Database Schema
Uses the existing UNHEALTHY_CONTAINERS_V2 table with support for all 5 health states.

Each record includes:
Some code optimizations in this PR for Recon's ContainerHealthTask are done using the Cursor AI tool.

What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13891
How was this patch tested?
Added JUnit test cases and tested using a local docker cluster.
Recon UNHEALTHY_CONTAINERS Table — Performance Optimisations

Summary
This PR also improves the read throughput of the UNHEALTHY_CONTAINERS Derby table by 43–67×, fixes a latent ERROR XBCM4 crash that would occur on any cluster large enough to trigger a >2,000-container DELETE in one statement, and removes a redundant Java-side sort that was executing on every paginated API response.
Changes

1. New composite index — ContainerSchemaDefinition.java

Old index:

New index:
Why this matters for paginated reads:

With the old single-column index, Derby had to:
- scan every entry matching container_state = ? (up to 200K entries);
- sort by container_id on every single page call — an O(n) operation repeated once per page.

With the composite index, Derby jumps directly to (state, minContainerId) and reads the next LIMIT entries sequentially — O(1) per page regardless of cursor position or total row count. The composite index also covers COUNT(*) WHERE container_state = ? and GROUP BY container_state queries via its leading column prefix, so those queries retain their index-only access path.
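The seek-style page read this composite index enables can be sketched in plain Java. The NavigableSet below stands in for the index ordering on (container_state, container_id); the equivalent SQL would be WHERE container_state = ? AND container_id > ? ORDER BY container_id with a LIMIT. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch of keyset ("seek") pagination: resume after the cursor directly,
// without rescanning or re-sorting the whole state partition per page.
public final class KeysetPageSketch {
  static List<Long> page(NavigableSet<Long> idsInState, long afterId, int limit) {
    List<Long> out = new ArrayList<>(limit);
    // tailSet seeks past the cursor in O(log n), like the index seek
    for (Long id : idsInState.tailSet(afterId, false)) {
      if (out.size() == limit) {
        break;                       // the LIMIT
      }
      out.add(id);                   // sequential read of the next entries
    }
    return out;
  }

  public static void main(String[] args) {
    NavigableSet<Long> ids = new TreeSet<>(Arrays.asList(1L, 3L, 5L, 7L, 9L));
    System.out.println(page(ids, 3L, 2)); // [5, 7]
  }
}
```

The caller passes the last container_id of the previous page as afterId, which is exactly the minContainerId cursor the API already exposes.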
2. ContainerHealthSchemaManagerV2.getUnhealthyContainers() — two fixes

a) Conditional Java-side sort removed for forward pagination

For the common forward-pagination path (minContainerId), the SQL ORDER BY container_id ASC already delivers sorted rows. The redundant Java sort was calling Comparator.comparingLong on every page (up to 200 pages per state per request).

b) JDBC fetch-size hint added
Derby's default JDBC fetch size is 1 row per wire call. For a 5,000-row page this meant 5,000 individual JDBC fetch round-trips inside the driver before any data reached the application layer. Setting fetchSize(limit) pre-buffers the full page in a single JDBC call.
3. ContainerHealthSchemaManagerV2.batchDeleteSCMStatesForContainers() — internal chunking

Bug fixed: Derby's SQL compiler generates a Java class per prepared statement. A WHERE container_id IN (N values) predicate combined with the 7-state container_state IN (…) predicate generates an expression tree whose compiled bytecode can exceed the JVM 65,535-byte per-method limit (ERROR XBCM4). The method previously delegated chunking to callers. On a large cluster ReconReplicationManager.persistUnhealthyRecords() passes the full container list in one call — which would crash Derby on any cluster with >2,000 containers in a single scan batch.

Fix: the method now chunks internally at MAX_DELETE_CHUNK_SIZE = 1,000 IDs per SQL statement. Callers pass any size list; the method is safe by construction.
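The internal chunking can be sketched as follows. MAX_DELETE_CHUNK_SIZE mirrors the value named in the PR; the jOOQ DELETE itself is elided, and each sublist would feed one DELETE ... WHERE container_id IN (...) statement.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split an arbitrarily large ID list so every generated
// DELETE ... IN (...) stays under Derby's compiled-statement size limit.
public final class ChunkSketch {
  static final int MAX_DELETE_CHUNK_SIZE = 1000;

  static List<List<Long>> chunks(List<Long> ids) {
    List<List<Long>> out = new ArrayList<>();
    for (int from = 0; from < ids.size(); from += MAX_DELETE_CHUNK_SIZE) {
      int to = Math.min(from + MAX_DELETE_CHUNK_SIZE, ids.size());
      out.add(ids.subList(from, to)); // one DELETE statement per sublist
    }
    return out;
  }

  public static void main(String[] args) {
    List<Long> ids = new ArrayList<>();
    for (long i = 0; i < 2500; i++) {
      ids.add(i);
    }
    // 2,500 IDs => 3 statements of 1000 + 1000 + 500
    System.out.println(chunks(ids).size());
  }
}
```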
Performance Test (Added in last commit)

A new test class TestUnhealthyContainersDerbyPerformance benchmarks all operations at 1 million records (5 states × 200,000 container IDs).

Test environment: macOS 14 (Apple M-series), JDK 8, Derby 10.14 embedded, derby.storage.pageCacheSize = 20,000 (~80 MB page cache).

Results — baseline vs. optimised

[Results table: baseline vs. optimised timings for COUNT(*) total, COUNT by state (avg of 5), GROUP BY summary (all states), and COUNT by state after delete (avg)]

Raw logs

Optimised run output